New kernel methods for phenotype prediction from genotype data.

Authors

  • Ritsuko Onuki
  • Tetsuo Shibuya
  • Minoru Kanehisa
Abstract

Phenotype prediction from genotype data is one of the most important issues in computational genetics. In this work, we propose a new kernel method (i.e., a support vector machine, SVM, method) for phenotype prediction from genotype data. In our method, we first infer multiple suboptimal haplotype candidates from each genotype using a hidden Markov model (HMM), and then compute the kernel matrix from the predicted haplotype candidates and their emission probabilities from the HMM. We validated the performance of our method through experiments on several datasets: an artificial dataset constructed with the program GeneArtisan, a real dataset of the NAT2 gene from the International HapMap Project, and a real dataset of genotypes of diseased individuals. The experiments show that our method is superior to ordinary naive kernel methods (i.e., those not based on haplotype prediction), especially in cases of strong linkage disequilibrium (LD).
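The abstract does not give the kernel formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each genotype has already been resolved into a few candidate haplotype pairs, each with a weight standing in for the HMM-derived probability, and it assumes a simple allele-matching similarity as the base kernel. All function names and the toy data are hypothetical.

```python
from itertools import product


def haplotype_similarity(h1, h2):
    """Fraction of SNP sites at which two haplotypes carry the same allele."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP sites")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def candidate_kernel(cands_x, cands_y):
    """Probability-weighted similarity between two individuals.

    Each argument is a list of (weight, (hap_a, hap_b)) tuples: the candidate
    haplotype pairs for one genotype and their assumed probabilities.
    """
    total = 0.0
    for (wx, pair_x), (wy, pair_y) in product(cands_x, cands_y):
        # Average the base similarity over all four haplotype pairings,
        # so the order of the two haplotypes within a pair does not matter.
        sim = sum(haplotype_similarity(hx, hy)
                  for hx in pair_x for hy in pair_y) / 4
        total += wx * wy * sim
    return total


def kernel_matrix(samples):
    """Build the symmetric kernel matrix for a list of individuals."""
    n = len(samples)
    return [[candidate_kernel(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Toy data (invented for illustration): two individuals, four SNP sites.
samples = [
    [(0.7, ("0101", "1010")), (0.3, ("0110", "1001"))],  # ambiguous phasing
    [(1.0, ("0101", "0101"))],                           # homozygous, unambiguous
]
K = kernel_matrix(samples)
```

A matrix built this way could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.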

Similar resources

Ensemble Kernel Learning Model for Prediction of Time Series Based on the Support Vector Regression and Meta Heuristic Search

In this paper, a method for predicting time series is presented. Time series prediction is a process that predicts future system values based on information obtained from past and present data points. Time series prediction models are widely used in various fields such as engineering and economics. The main purpose of using different models for time series prediction is to make the forecast with...


Prediction of Phenotype Information from Genotype Data

The dissection of complex diseases is one of the greatest challenges of human genetics with important clinical and scientific applications. Traditionally, associations were sought between single genetic markers and disease. The availability of large scale SNP data makes it possible, for the first time, to study the predictive power of genotypes and haplotypes with respect to phenotype data. Her...


Some New Methods for Prediction of Time Series by Wavelets

Extended Abstract. Forecasting is one of the most important purposes of time series analysis. For many years, classical methods were used for this aim. But these methods do not give good performance results for real time series due to non-linearity and non-stationarity of these data sets. On one hand, most of real world time series data display a time-varying second order structure. On th...


Kernel machine methods for integrative analysis of genome-wide methylation and genotyping studies.

Many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation in addition to genotype in the same subjects. However, integrating information from both data types is challenging. In this paper, we propose a composite kernel machine regression model to test the joint epigenetic and genetic effect. Our approach works at the gene level, which allows for a commo...


Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir

The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...


Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume 22  Issue 

Pages  -

Publication date 2010